Goto

Collaborating Authors

 low-rank factorization


The Price of Privacy for Low-rank Factorization

Neural Information Processing Systems

In this paper, we study what price one has to pay to release \emph{differentially private low-rank factorization} of a matrix. We consider various settings that are close to the real world applications of low-rank factorization: (i) the manner in which matrices are updated (row by row or in an arbitrary manner), (ii) whether matrices are distributed or not, and (iii) how the output is produced (once at the end of all updates, also known as \emph{one-shot algorithms} or continually). Even though these settings are well studied without privacy, surprisingly, there are no private algorithm for these settings (except when a matrix is updated row by row). We present the first set of differentially private algorithms for all these settings. Our algorithms when private matrix is updated in an arbitrary manner promise differential privacy with respect to two stronger privacy guarantees than previously studied, use space and time \emph{comparable} to the non-private algorithm, and achieve \emph{optimal accuracy}. To complement our positive results, we also prove that the space required by our algorithms is optimal up to logarithmic factors. When data matrices are distributed over multiple servers, we give a non-interactive differentially private algorithm with communication cost independent of dimension. In concise, we give algorithms that incur {\em optimal cost across all parameters of interest}. We also perform experiments to verify that all our algorithms perform well in practice and outperform the best known algorithm until now for large range of parameters.


Low-Rank Subspaces in GANs

Neural Information Processing Systems

The latent space of a Generative Adversarial Network (GAN) has been shown to encode rich semantics within some subspaces. To identify these subspaces, researchers typically analyze the statistical information from a collection of synthesized data, and the identified subspaces tend to control image attributes globally (i.e., manipulating an attribute causes the change of an entire image). By contrast, this work introduces low-rank subspaces that enable more precise control of GAN generation. Concretely, given an arbitrary image and a region of interest (e.g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces. There are three distinguishable strengths of our approach that can be aptly called LowRankGAN.


The Price of Privacy for Low-rank Factorization

Neural Information Processing Systems

In this paper, we study what price one has to pay to release \emph{differentially private low-rank factorization} of a matrix. We consider various settings that are close to the real world applications of low-rank factorization: (i) the manner in which matrices are updated (row by row or in an arbitrary manner), (ii) whether matrices are distributed or not, and (iii) how the output is produced (once at the end of all updates, also known as \emph{one-shot algorithms} or continually). Even though these settings are well studied without privacy, surprisingly, there are no private algorithm for these settings (except when a matrix is updated row by row). We present the first set of differentially private algorithms for all these settings. Our algorithms when private matrix is updated in an arbitrary manner promise differential privacy with respect to two stronger privacy guarantees than previously studied, use space and time \emph{comparable} to the non-private algorithm, and achieve \emph{optimal accuracy}. To complement our positive results, we also prove that the space required by our algorithms is optimal up to logarithmic factors. When data matrices are distributed over multiple servers, we give a non-interactive differentially private algorithm with communication cost independent of dimension. In concise, we give algorithms that incur {\em optimal cost across all parameters of interest}. We also perform experiments to verify that all our algorithms perform well in practice and outperform the best known algorithm until now for large range of parameters.



Energy Considerations for Large Pretrained Neural Networks

Mei, Leo, Stamp, Mark

arXiv.org Artificial Intelligence

Increasingly complex neural network architectures have achieved phenomenal performance. However, these complex models require massive computational resources that consume substantial amounts of electricity, which highlights the potential environmental impact of such models. Previous studies have demonstrated that substantial redundancies exist in large pre-trained models. However, previous work has primarily focused on compressing models while retaining comparable model performance, and the direct impact on electricity consumption appears to have received relatively little attention. By quantifying the energy usage associated with both uncompressed and compressed models, we investigate compression as a means of reducing electricity consumption. We consider nine different pre-trained models, ranging in size from 8M parameters to 138M parameters. To establish a baseline, we first train each model without compression and record the electricity usage and time required during training, along with other relevant statistics. We then apply three compression techniques: Steganographic capacity reduction, pruning, and low-rank factorization. In each of the resulting cases, we again measure the electricity usage, training time, model accuracy, and so on. We find that pruning and low-rank factorization offer no significant improvements with respect to energy usage or other related statistics, while steganographic capacity reduction provides major benefits in almost every case. We discuss the significance of these findings.


Low-Rank Subspaces in GANs

Neural Information Processing Systems

The latent space of a Generative Adversarial Network (GAN) has been shown to encode rich semantics within some subspaces. To identify these subspaces, researchers typically analyze the statistical information from a collection of synthesized data, and the identified subspaces tend to control image attributes globally (i.e., manipulating an attribute causes the change of an entire image). By contrast, this work introduces low-rank subspaces that enable more precise control of GAN generation. Concretely, given an arbitrary image and a region of interest (e.g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces. There are three distinguishable strengths of our approach that can be aptly called LowRankGAN.


Lossless Model Compression via Joint Low-Rank Factorization Optimization

Zhang, Boyang, Cheng, Daning, Zhang, Yunquan, Liu, Fangmin, Tian, Jiake

arXiv.org Artificial Intelligence

Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between approximated and original weight matrices. Despite achieving performances close to the original models when $\delta$ is optimized, a performance discrepancy remains due to the separate optimization processes for low-rank factorization and model performance, resulting in unavoidable losses. We address this issue by introducing a novel joint optimization strategy for lossless low-rank weight factorization, which, for the first time, enhances the model's performance beyond the original. Our approach begins with a theoretical analysis of the relationship between low-rank factorization and model optimization objectives, establishing a precise perturbation range for matrix factorization errors on model performance. This challenge is then reformulated as a numerical rank deficiency problem with inequality constraints and develop a joint objective that simultaneously addresses factorization error and model performance. Based on the above analysis, we propose two optimization algorithms: \textbf{a lossless optimization algorithm} that maximizes model accuracy while ensuring compression, and \textbf{a compact optimization algorithm} that minimizes model size while preserving performance. These algorithms do not require fine-tuning and can directly compress numerous deep models to achieve lossless results. Our methods demonstrate robust efficacy across various vision and language tasks. For example, the compressed model reduced by 70\% on ResNext50 outperforms the original. Our code will be made public.


On Speeding Up Language Model Evaluation

Zhou, Jin Peng, Belardi, Christian K., Wu, Ruihan, Zhang, Travis, Gomes, Carla P., Sun, Wen, Weinberger, Kilian Q.

arXiv.org Artificial Intelligence

Large language models (LLMs) currently dominate the field of natural language processing (NLP), representing the state-of-the-art across a diverse array of tasks. Developing a model of this nature, from training to inference, requires making numerous decisions which define a combinatorial search problem. For example, selecting the optimal pre-trained LLM, prompt, or hyperparameters to attain the best performance for a task often requires evaluating multiple candidates on an entire test set. This exhaustive evaluation can be time-consuming and costly, as both inference and metric computation with LLMs are resource-intensive. In this paper, we address the challenge of identifying the best method within a limited budget for evaluating methods on test examples. By leveraging the well-studied multi-armed bandit framework, which sequentially selects the next method-example pair to evaluate, our approach, combining multi-armed bandit algorithms with low-rank factorization, significantly reduces the required resources. Experiments show that our algorithms can identify the top-performing method using only 5-15\% of the typically needed resources, resulting in an 85-95\% reduction in cost.


An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers

Gupta, Ashim, Saravani, Sina Mahdipour, Sadayappan, P., Srikumar, Vivek

arXiv.org Artificial Intelligence

The increasing size of transformer-based models in NLP makes the question of compressing them important. In this work, we present a comprehensive analysis of factorization based model compression techniques. Specifically, we focus on comparing straightforward low-rank factorization against the recently introduced Monarch factorization, which exhibits impressive performance preservation on the GLUE benchmark. To mitigate stability issues associated with low-rank factorization of the matrices in pre-trained transformers, we introduce a staged factorization approach wherein layers are factorized one by one instead of being factorized simultaneously. Through this strategy we significantly enhance the stability and reliability of the compression process. Further, we introduce a simple block-wise low-rank factorization method, which has a close relationship to Monarch factorization. Our experiments lead to the surprising conclusion that straightforward low-rank factorization consistently outperforms Monarch factorization across both different compression ratios and six different text classification tasks.

  factorization, low-rank factorization, matrix, (16 more...)
2406.11307

HyperBandit: Contextual Bandit with Hypernewtork for Time-Varying User Preferences in Streaming Recommendation

Shen, Chenglei, Zhang, Xiao, Wei, Wei, Xu, Jun

arXiv.org Artificial Intelligence

In real-world streaming recommender systems, user preferences often dynamically change over time (e.g., a user may have different preferences during weekdays and weekends). Existing bandit-based streaming recommendation models only consider time as a timestamp, without explicitly modeling the relationship between time variables and time-varying user preferences. This leads to recommendation models that cannot quickly adapt to dynamic scenarios. To address this issue, we propose a contextual bandit approach using hypernetwork, called HyperBandit, which takes time features as input and dynamically adjusts the recommendation model for time-varying user preferences. Specifically, HyperBandit maintains a neural network capable of generating the parameters for estimating time-varying rewards, taking into account the correlation between time features and user preferences. Using the estimated time-varying rewards, a bandit policy is employed to make online recommendations by learning the latent item contexts. To meet the real-time requirements in streaming recommendation scenarios, we have verified the existence of a low-rank structure in the parameter matrix and utilize low-rank factorization for efficient training. Theoretically, we demonstrate a sublinear regret upper bound against the best policy. Extensive experiments on real-world datasets show that the proposed HyperBandit consistently outperforms the state-of-the-art baselines in terms of accumulated rewards.